Determining the number of clusters in the Straight K-means: Experimental comparison of eight options

Authors

  • Mark Ming-Tso Chiang
  • Boris Mirkin
Abstract

The problem of determining “the right number of clusters” in K-Means has attracted considerable interest, especially in recent years. However, to the authors’ knowledge, no experimental comparison of the proposed methods has been reported so far. This paper presents the results of such a comparison, involving eight cluster-number selection options that represent four different approaches. The data are generated according to a Gaussian-mixture distribution with clusters of varying spread and size. The most consistent results are shown by the silhouette-width-based method of Kaufman and Rousseeuw (1990) and by iKMeans of Mirkin (2005).
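As a rough illustration of one of the options named in the abstract, the sketch below picks the number of clusters by maximizing the average silhouette width over several K-means runs on Gaussian-mixture data. It is not the paper's experimental setup: the cluster means, sizes, spreads, the range of K, the number of random restarts, and the helper names `kmeans` and `silhouette_width` are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Straight (Lloyd-style) K-means; returns labels of the lowest-cost restart."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Recompute centroids; an empty cluster keeps its old center.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        cost = d[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def silhouette_width(X, labels):
    """Average silhouette width over all points (Kaufman and Rousseeuw, 1990)."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton's silhouette is taken as 0
        a = dist[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Illustrative Gaussian-mixture data: three clusters of different size and spread.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
sizes = [60, 40, 20]
spreads = [0.5, 1.0, 1.5]
X = np.vstack([rng.normal(m, s, size=(n, 2))
               for m, n, s in zip(means, sizes, spreads)])

# Score each candidate K and keep the one with the largest silhouette width.
scores = {k: silhouette_width(X, kmeans(X, k, rng)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On well-separated data such as this, the silhouette width usually peaks at the generating number of clusters; how well it holds up when spreads grow and clusters overlap is the kind of question the comparison in the paper addresses.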


Similar resources

Optimum Ensemble Classification for Fully Polarimetric SAR Data Using Global-Local Classification Approach

In this paper, a proposed ensemble classification for fully polarimetric synthetic aperture radar (PolSAR) data using a global-local classification approach is presented. In the first step, to perform the global classification, the training feature space is divided into a specified number of clusters. In the next step to carry out the local classification over each of these clusters, which cont...


Comparison of Two Computational Microstructure Models for Predicting Effective Transverse Elastic Properties of Unidirectional Fiber Reinforced Composites

Characterization of properties of composites has attracted a great deal of attention towards exploring their applications in engineering. The purpose of this work is to study the difference of two computational microstructure models which are widely used for determining effective transverse elastic properties of unidirectional fiber reinforced composites. The first model based on the classic me...


Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)

Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...


Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...


Data Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach

Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...



Publication date: 2006